ABSTRACT:


Conventional finite-difference migration has relied on approximations to 
the acoustic wave equation which allow energy to propagate only downwards. 
Although generally reliable, such approaches usually do not yield an accurate 
migration for geological structures with strong lateral velocity variations or 
with steeply dipping reflectors. An earlier study by D. Kosloff and E. Baysal
("Migration with the Full Acoustic Wave Equation") examined an alternative approach
based on the full acoustic wave equation. The 2D, Fourier-type algorithm which 
was developed was tested by Kosloff and Baysal against synthetic data and against 
physical model data. The results indicated that such a scheme gives accurate 
migration for complicated structures. This paper describes the development and 
testing of a vectorized, 3D migration program for the CYBER 205 using the 
Kosloff/Baysal method. The program can accept as many as 65,536 zero-offset
(stacked) traces. To process a data cube of such magnitude efficiently
(about 67 million data values for 1,024-sample traces), the data motion aspects
of the program employ the CDC-supplied subroutine SLICE4, which provides
high-speed input/output by taking advantage of the efficiency of the
system-provided subroutines Q7BUFIN and Q7BUFOUT and of the parallelism
achievable by distributing data transfer over four different input/output
channels. The results obtained are consistent with those of
Kosloff and Baysal. Additional investigations, based upon the work reported in
this paper, are in progress.
1.1




In an attempt to develop a migration technique that did not have
the faults of conventional finite-difference migration techniques,
Kosloff and Baysal introduced a migration technique based on the full
acoustic wave equation [1]. Conventional finite-difference
techniques use approximations to the wave equation that allow
energy to propagate only downwards. Although these techniques yield
reliable migration in most cases, they usually do not yield an accurate
migration for geological structures with strong lateral velocity
variations or with steeply dipping reflectors. The results of the
migration technique developed by Kosloff and Baysal showed that their
technique is able to migrate these complicated geological
structures accurately. Furthermore, they found that there was no need to invoke
complicated schemes in an attempt to correct the deficiencies of
one-way equations [2].
1.2 DESCRIPTION OF THE PRESENT STUDY


Although the technique developed by Kosloff and Baysal provides an 
excellent migration algorithm, it still is a two-dimensional migration 
technique. The object of this research was to extend the 2D migration 
technique of Kosloff and Baysal into a 3D migration technique that 
would migrate a cube of 65,536 (or fewer) traces, each of length 1,024
samples. This goal immediately imposed several problems that were much 
greater than extending the numerical methods of Kosloff and Baysal. Of 
these problems, execution time and data motion were the most 
significant. Although the 2D migration of Kosloff and Baysal was
implemented on a Digital Equipment Corporation VAX-11/780 incorporating
an FPS-100 array processor, with favorable processing time, it was
observed that this hardware was much too small to handle
the 3D technique in a reasonable amount of time. Consequently, for its
high rate of computation, the CDC CYBER 205 located at Colorado State 
University (CSU) was chosen to be the target machine. In Chapters II, 
III and IV, the following aspects of the 3D migration technique are 
developed: (1) the numerical methods involved; (2) the major features 
of the program implementing the 3D migration technique; and (3) the
results of numerical tests of the program. 
2.1 INTRODUCTION

Conventional finite-difference migration has relied on 
approximations to the wave equation which allow energy to propagate 
only downwards. Although generally reliable, such equations usually do 
not give accurate migration for structures with strong lateral velocity 
variations or with steep dips. The migration technique presented here 
is a three-dimensional extension of a two-dimensional migration 
technique developed earlier by Kosloff and Baysal [3]. The migration
technique presented here, referred to in this paper as the KBF
migration technique (for Kosloff/Baysal Fourier type), is based on the
full acoustic wave equation, (2.1).






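    \frac{\partial}{\partial x}\left(\frac{1}{\rho}\frac{\partial P}{\partial x}\right) + \frac{\partial}{\partial y}\left(\frac{1}{\rho}\frac{\partial P}{\partial y}\right) + \frac{\partial}{\partial z}\left(\frac{1}{\rho}\frac{\partial P}{\partial z}\right) = \frac{1}{\rho c^2}\frac{\partial^2 P}{\partial t^2}    (2.1)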
2.2 INPUT


It is assumed that input to the KBF program consists of a "cube"
of zero-offset traces in (x,y,z=0,t) space. The KBF technique
presented here is designed to handle Nx * Ny such traces corresponding
to Nx * Ny uniformly spaced points in the x and the y directions. The
implementation discussed is designed so that the following must be true:

    32 <= Nx <= 256 and Nx = 2^i for some integer i
    32 <= Ny <= 256 and Ny = 2^j for some integer j

These restrictions were chosen so as to test program efficiency;
they do not apply, in general, to the KBF scheme.

For each (x, y) pair, there will be N_t sample points in time, t_m,
m = 1, ..., N_t, at which values of pressure, P(x,y,z=0,t_m), are given.
N_t must also be a power of two.
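
The size restrictions above can be checked mechanically before any data is read. The following is a minimal Python sketch and is not part of the original Fortran program; the function names `is_power_of_two` and `check_kbf_dims` are hypothetical.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive integer power of two."""
    return n > 0 and (n & (n - 1)) == 0


def check_kbf_dims(nx: int, ny: int, nt: int) -> None:
    """Raise ValueError unless the cube dimensions satisfy the stated rules.

    Hypothetical helper: 32 <= Nx, Ny <= 256, and Nx, Ny, Nt powers of two.
    """
    for name, n in (("Nx", nx), ("Ny", ny)):
        if not 32 <= n <= 256:
            raise ValueError(f"{name}={n} is outside the range 32..256")
        if not is_power_of_two(n):
            raise ValueError(f"{name}={n} is not a power of two")
    if not is_power_of_two(nt):
        raise ValueError(f"Nt={nt} is not a power of two")


# The largest cube considered in this paper passes the check.
check_kbf_dims(256, 256, 1024)
```

A call such as `check_kbf_dims(48, 64, 1024)` raises `ValueError`, because 48 lies in the permitted range but is not a power of two.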

In equation (2.1) it is assumed that the density, ρ, is constant
and that the velocity function, c(x,y,z), will be provided by the
user. For testing purposes, velocity is given by a Fortran function
subprogram in the code presented in the Appendix. Other forms
representing the velocities may be used to replace the supplied
function.
OBJECT OF THE PROGRAM

    Given   P(x, y, z=0, t)   for t = 0, 1DT, 2DT, ..., TMAX
    obtain  P(x, y, z, t=0)   for z = 0, 1DZ, 2DZ, ..., ZMAX

BASIC NUMERICAL METHOD

Equation (2.1) is Fourier transformed with respect to time,
assuming density, ρ, is constant. The second-order transformed
equations can then be reduced to a system of first-order equations in
the usual manner. If density is constant, then we can write the
following series of equations:


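    P(x,y,z,t) = F^{-1} \tilde{P}(x,y,z,w)

    \frac{\partial P}{\partial t} = j w F^{-1} \tilde{P}

    \frac{\partial^2 P}{\partial t^2} = -w^2 F^{-1} \tilde{P}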
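so that

    \frac{\partial^2 \tilde{P}}{\partial z^2} = -\left( \frac{w^2}{c^2} + \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) \tilde{P}    (2.2)

or, written as a first-order system in z,

    \frac{\partial}{\partial z} \begin{bmatrix} \tilde{P} \\ \partial \tilde{P} / \partial z \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -\left( \frac{w^2}{c^2} + \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) & 0 \end{bmatrix} \begin{bmatrix} \tilde{P} \\ \partial \tilde{P} / \partial z \end{bmatrix}    (2.3)

which is of the form

    \frac{dV}{dz} = f(z, V)    (2.4)

where

    V = \begin{bmatrix} \tilde{P} \\ \partial \tilde{P} / \partial z \end{bmatrix}    (2.5)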
The expression "transformed with respect to time" means that the
functions P(x,y,z,t_m) are represented by Discrete Fourier
Transforms:

    P(x,y,z,t_m) = \sum_{i=1}^{N_t} \tilde{P}(x,y,z,w_i) e^{j w_i t_m}    (2.6)

where

    t_m = (m - 1) DT              for m = 1, 2, ..., N_t/2 + 1
    t_m = (m - (N_t + 1)) DT      for m = N_t/2 + 2, ..., N_t

\tilde{P} is given by the Inverse Discrete Fourier Transform:

    \tilde{P}(x,y,z,w_i) = \frac{1}{N_t} \sum_{m=1}^{N_t} P(x,y,z,t_m) e^{-j w_i t_m}    (2.7)

where

    w_i = \frac{2\pi}{DT N_t} (i - 1)              for i = 1, 2, ..., N_t/2 + 1
    w_i = \frac{2\pi}{DT N_t} (i - (N_t + 1))      for i = N_t/2 + 2, ..., N_t

and where
DT is the sampling interval in time and j = \sqrt{-1}. Equation (2.6) is then
substituted in (2.1). This results in (2.2), which must be satisfied
for each w_i, for i = 1, ..., N_t/2 + 1.

Thus, the partial differential equations which provide a
discrete approximation to (2.1), involving unknown functions
P(x,y,z,t_m), are replaced by N_t/2 + 1 partial differential equations
involving unknown functions P(x,y,z,w_i). Note that in the transformed
equations, dependence on time, t, has been eliminated.
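
The transform pair (2.6) and (2.7), with the wrapped ordering of t_m and w_i, can be written out directly. The sketch below is a slow O(N_t^2) Python illustration rather than the FFT used by the program. It assumes the sign convention P = sum of P~ exp(+j w t), which is the one consistent with the relation dP/dt = jw F^{-1} P~ above; all function names are hypothetical.

```python
import cmath
import math


def wrapped_index(k: int, n: int) -> int:
    """Map 0-based k to 0..n/2, then -(n/2 - 1)..-1, as in the ranges above."""
    return k if k <= n // 2 else k - n


def angular_frequencies(nt: int, dt: float) -> list:
    """w_i for i = 1..Nt (returned as a 0-based list)."""
    base = 2.0 * math.pi / (dt * nt)
    return [base * wrapped_index(k, nt) for k in range(nt)]


def time_samples(nt: int, dt: float) -> list:
    """t_m for m = 1..Nt (returned as a 0-based list)."""
    return [dt * wrapped_index(m, nt) for m in range(nt)]


def to_frequency(trace: list, dt: float) -> list:
    """Equation (2.7): time samples to frequency samples, including 1/Nt."""
    nt = len(trace)
    w, t = angular_frequencies(nt, dt), time_samples(nt, dt)
    return [sum(p * cmath.exp(-1j * wi * tm) for p, tm in zip(trace, t)) / nt
            for wi in w]


def to_time(spectrum: list, dt: float) -> list:
    """Equation (2.6): frequency samples back to time samples."""
    nt = len(spectrum)
    w, t = angular_frequencies(nt, dt), time_samples(nt, dt)
    return [sum(s * cmath.exp(1j * wi * tm) for s, wi in zip(spectrum, w))
            for tm in t]


trace = [0.0, 1.0, 0.5, -0.25, 0.0, 0.0, 2.0, -1.0]
spectrum = to_frequency(trace, dt=0.004)
recovered = to_time(spectrum, dt=0.004)  # agrees with trace up to rounding
```

Because t_1 = 0, the plain sum of the spectrum reproduces the first time sample, which is the property the program relies on when it accumulates frequency slices to form the t = 0 image.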

With an appropriate approximation to \partial^2/\partial x^2 + \partial^2/\partial y^2,
the "classical" 4th-order Runge-Kutta algorithm is applied to integrate
equation (2.2) numerically in z. The (vector) computational equations
are summarized below:

    K1 = Dz * f(z, V_old)
    K2 = Dz * f(z + Dz/2, V_old + K1/2)
    K3 = Dz * f(z + Dz/2, V_old + K2/2)
    K4 = Dz * f(z + Dz, V_old + K3)

    V_new = V_old + (K1 + 2K2 + 2K3 + K4) / 6
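
The Runge-Kutta update above can be sketched in a few lines of Python. This is an illustration only: the right-hand side used here is a toy case (a single plane-wave component with K_x = K_y = 0, so that d2P/dz2 = -(w/c)^2 P), not the Fourier-based right-hand side of the actual program, and the names `rk4_step` and `make_rhs` are hypothetical.

```python
import math


def rk4_step(f, z, v, dz):
    """One classical 4th-order Runge-Kutta step for dV/dz = f(z, V).

    V is a list of numbers; f returns a list of the same length.
    """
    k1 = [dz * x for x in f(z, v)]
    k2 = [dz * x for x in f(z + dz / 2, [a + b / 2 for a, b in zip(v, k1)])]
    k3 = [dz * x for x in f(z + dz / 2, [a + b / 2 for a, b in zip(v, k2)])]
    k4 = [dz * x for x in f(z + dz, [a + b for a, b in zip(v, k3)])]
    return [a + (b + 2 * c + 2 * d + e) / 6
            for a, b, c, d, e in zip(v, k1, k2, k3, k4)]


def make_rhs(w, c):
    """Toy right-hand side: V = [P, dP/dz], with no lateral variation."""
    kz2 = (w / c) ** 2
    return lambda z, v: [v[1], -kz2 * v[0]]


w = 2.0 * math.pi * 25.0   # 25 Hz component (assumed for illustration)
c = 3000.0                 # m/s
dz = 6.0                   # m
f = make_rhs(w, c)

z, v = 0.0, [1.0, 0.0]     # P = 1, dP/dz = 0 at z = 0
for _ in range(100):
    v = rk4_step(f, z, v, dz)
    z += dz

exact = math.cos((w / c) * z)   # analytic solution for this toy case
```

For this toy problem the numerical value `v[0]` stays close to `exact` over 100 steps, which is the kind of check one might run before substituting a more expensive right-hand side.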
2.4 KBF DESIGN OUTLINE


The program has four main subdivisions, whose tasks are summarized
below:


Part I: For each pair of (x,y) values, the corresponding
zero-offset trace of P(x,y,0,t) values is converted to another "trace"
of P(x,y,0,w) values by application of the discrete Fourier transform
(2.7).


Part II: For each w_i value (i=1,2,...,N_t), the P(x,y,0,w_i) values
are re-ordered into w_i-slices organized either sequentially in y for
each x, or sequentially in x for each y, as appropriate for further
transformations.


Part III: Each w_i-slice, from the transformed input cube of
P(x,y,0,w_i) values (see Figure 2.1), is developed into an (x,y,z,w_i)
cube of P(x,y,z,w_i) values. This development is performed by
integrating equation (2.2) numerically. The resulting P(x,y,z,w_i)
values are accumulated over all w_i for each (x,y,z) combination. Since
all the related exponential multipliers equal 1 in magnitude
(see equation (2.6)), this results in the generation of P(x,y,z,t=0)
values, as required. (Note: t_1 = 0.)
There are two sub-problems of Part III:


Part III.1: Initial values for dP/dz are obtained by the application
of a two-dimensional Fourier transform to P followed by multiplication
by SQRT[-1 * (w_i^2/c^2 - K_x^2 - K_y^2)]. Evanescent energy components are then
eliminated and dP/dz is obtained by the application of a two-dimensional
inverse Fourier transform to the result.
Part III.2: The Laplacian term \partial^2 P/\partial x^2 + \partial^2 P/\partial y^2
must be approximated four times for each Dz step. This is achieved by the
use of a two-dimensional Fourier transform, followed by multiplication
by -(K_x^2 + K_y^2). Evanescent energy is eliminated from P by applying a
two-dimensional Fourier transform to P, obtaining \hat{P}. For all (K_x,K_y)
pairs such that K_x^2 + K_y^2 > (w_i/c(x,y,z))^2, \hat{P} is set to zero. Then a
two-dimensional inverse Fourier transform is applied to yield P', which
is input to the next step of numerical integration. Evanescent energy
is also removed from the Laplacian term in the same manner.
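
The evanescent cut-off amounts to a mask in the (K_x, K_y) plane. The following Python sketch shows the mask only, under two stated simplifications: the velocity c is a single constant rather than c(x,y,z), and the wavenumbers follow the usual wrapped FFT ordering, which is assumed here. The function names are hypothetical and the two-dimensional transforms themselves are omitted.

```python
import math


def wavenumbers(n: int, d: float) -> list:
    """Wrapped wavenumbers 2*pi*k/(n*d) for k = 0..n/2, -(n/2 - 1)..-1."""
    base = 2.0 * math.pi / (n * d)
    return [base * (k if k <= n // 2 else k - n) for k in range(n)]


def remove_evanescent(p_hat, kx, ky, w, c):
    """Zero every component with Kx**2 + Ky**2 > (w / c)**2.

    p_hat[iy][ix] holds the 2D spectrum of one frequency slice.
    """
    limit = (w / c) ** 2
    return [[val if kxi * kxi + kyj * kyj <= limit else 0j
             for val, kxi in zip(row, kx)]
            for row, kyj in zip(p_hat, ky)]


n, dx = 8, 12.0                    # tiny grid, 12 m spacing
kx = wavenumbers(n, dx)
ky = wavenumbers(n, dx)
flat_spectrum = [[1 + 0j] * n for _ in range(n)]
filtered = remove_evanescent(flat_spectrum, kx, ky,
                             w=2.0 * math.pi * 50.0, c=3000.0)
kept = sum(1 for row in filtered for val in row if val != 0)
```

On a vector machine the comparison would produce a bit vector, so the same selection can be carried out as a single masked vector operation instead of an element-by-element test.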


Part IV: For each (x,y), the P(x,y,z,t=0) values from Part III are
retrieved so as to be contiguous in z. These space traces are each
Fourier transformed and the downgoing energy is eliminated by filtering
out components with negative wave numbers. The resulting filtered
traces are inverse Fourier transformed, retaining only the real part of
the result, which is the desired 3D depth migration.
3.1 INTRODUCTION

The speed and capacity of the computer available to an individual
researcher imposes certain restrictions on the types of problems that
can be solved. The CYBER 205's vector features and high-speed scalar
processor provide a tool for solving problems in a matter of minutes
that would take on the order of days on a conventional scalar machine
(this speed increase depends, to a considerable extent, on the degree
to which it is possible to "vectorize" the scalar code). Of the
problems that can now be solved using the CYBER 205, the migration
application presented here makes extensive use of the CYBER 205's
vector facilities. This chapter contains an overview of vector
processing on the CYBER 205 and an in-depth discussion of the data-flow
required by the KBF migration algorithm.
This section deals primarily with the concept of vector machines;
however, it is not within the scope of this paper to bring the novice
up-to-date on vector computing. Several texts and papers have been
written to perform that task. Hockney and Jesshope [4] present a
comprehensive text covering vector and parallel processors as well as
vector and parallel algorithms. Section 2.3 of Hockney and Jesshope
[5] is dedicated to the CDC CYBER 205. For more information on the
CYBER 205, see also Kascic [6].

THE CDC CYBER 205, HISTORY

The CYBER 205, announced in 1980, replaced its predecessor, the
CYBER 203. In turn, the CYBER 203, introduced in 1979, was a
re-engineered version of the STAR 100. Conceived in 1964, the first
STAR 100 became operational in 1973. The instruction set for the
vector operations in the STAR 100 was based, primarily, on the APL
language. The STAR 100 was designed to execute at a rate of 100
Mega-flops (1 Mega-flop = one million floating point operations
executed per second).


THE CDC CYBER 205, PRESENT


The CYBER 205 is a member of the family of "pipelined" machines.
Pipelining refers to an assembly-line style of performing certain
operations; thus more than one set of operands can be operated upon at
a time. The vector processor of the CYBER 205 has what are known as
vector pipes. These vector pipes are designed to stream contiguous
data elements (vectors) through their pipelines. Presently, the CYBER
205 can have as many as four vector pipes, all of which can operate
concurrently. A four-pipe CYBER 205, processing 32-bit words, can
operate at a peak rate of 800 mega-flops.

The various data types utilized by the CYBER Fortran 2.0 language
include the following:


    Type                Comments

    Bit                 the machine is bit addressable
    Half-word           32-bit floating point
    Full-word           64-bit floating point; 64-bit integer
    Double-precision    128-bit floating point
    Complex             two consecutive 64-bit words
VECTOR OPERATIONS AND CONSIDERATIONS


Vectors on the CYBER 205 are "pointed to" by vector descriptors. 
A vector descriptor is a 64-bit entity with the following two fields:
(1) vector length, which consists of 16 bits, and (2) virtual address of
the first vector element, which consists of the remaining 48 bits.
Thus, a vector can have a length ranging from 0 to 65,535. Note that a
bit vector can be no longer than 65,535 elements even though it
consists of only 1024 64-bit memory words.

Vector operations come in a variety of forms on the CYBER 205,
some of which are displayed in Table 3.1.

Table 3.1. Vector Operation Examples.

      DIMENSION A(100), B(100), C(100)
      L = 100

    EXAMPLE   VECTOR CODE                         EQUIVALENT SCALAR CODE
    NUMBER

    (1)       A(1; L) = Q8VINTL(0, 1; L)             DO 10 I = 1, L
                                                  10 A(I) = I - 1

    (2)       B(1; L) = A(1; L) * 20.0               DO 20 I = 1, L
                                                  20 B(I) = A(I) * 20.0

    (3)       C(1; L) = A(1; L)*2.0+B(1; L)          DO 30 I = 1, L
                                                  30 C(I) = A(I)*2.0+B(I)
The examples in Table 3.1 are rather simple but resemble many
operations in scientific programs. Examples 1 and 2 show a vector
function call and a vector-scalar operation. Example 3 shows a "linked
triad" operation. A linked triad operation takes advantage of CYBER
205 hardware which supports such operations. As one can see in Table
3.2, the linked triad operations are quite efficient. An operation is
generally considered a linked triad when it consists of two vector
operands and one scalar operand.

In certain situations, the results of some elements of a vector
operation need not be saved. In this case, there is a mechanism for
avoiding storage which involves a control vector. A control vector is
a bit vector that specifies the storage of vector results. The control
vector has the same length as the result vector. Where it has a
value of one, the corresponding result vector element is saved;
where it has a value of zero, the corresponding result vector element
is not saved. The programmer also has the choice of reversing the
meaning of the ones and zeros in the control vector.
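
The effect of a control vector can be mimicked in scalar Python for illustration. This is only a model of the semantics described above, not CYBER 205 code; the name `masked_store` and its `store_on` parameter are hypothetical.

```python
def masked_store(result, computed, control, store_on=1):
    """Return `result` updated from `computed` where control == store_on.

    Elements whose control bit differs from `store_on` keep their old value,
    which models "the result element is not saved". Passing store_on=0
    reverses the meaning of the ones and zeros.
    """
    if not (len(result) == len(computed) == len(control)):
        raise ValueError("result, computed and control must have equal length")
    return [new if bit == store_on else old
            for old, new, bit in zip(result, computed, control)]


old_values = [0.0, 0.0, 0.0, 0.0]
new_values = [1.0, 2.0, 3.0, 4.0]
control = [1, 0, 1, 0]

saved_on_ones = masked_store(old_values, new_values, control)
saved_on_zeros = masked_store(old_values, new_values, control, store_on=0)
```

Here `saved_on_ones` keeps the new values at positions 1 and 3 only, while `saved_on_zeros` keeps them at positions 2 and 4.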

A certain number of clock cycles are needed to set up the vector
pipes. As this setup time is constant for a given operation, it is
more efficient, in terms of total execution time, to reduce the number
of vector operations by increasing the vector lengths whenever
possible. Table 3.2 shows the set-up times, as well as the timings for
the actual operations, for various operations on the CYBER 205.
Table 3.2. Vector Timing Information

    Vector Instruction        Number of          Number of
                              Set-up Cycles      Operating Cycles

    Addition, Subtraction     51                 N / 4
    Multiplication            52                 N / 4
    Division, Square root     80                 N / 0.61
    Linked triad              84                 N / 4

Where:

    N = Vector length
    1 Cycle = 20 nano-seconds
    The vector operations are on 32-bit words
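
The figures in Table 3.2 suggest a simple cost model: total cycles = set-up cycles + operating cycles. The Python sketch below applies that model to two comparisons; the model itself and the function name `vector_time_ns` are assumptions made for illustration, and real timings would also depend on memory behaviour.

```python
CYCLE_NS = 20.0

# (set-up cycles, divisor of N giving operating cycles), taken from Table 3.2
TIMING = {
    "add": (51, 4.0),
    "multiply": (52, 4.0),
    "divide": (80, 0.61),
    "linked_triad": (84, 4.0),
}


def vector_time_ns(op: str, n: int) -> float:
    """Estimated time of one vector instruction of length n, in nanoseconds."""
    setup, divisor = TIMING[op]
    return (setup + n / divisor) * CYCLE_NS


# Same 64,000 additions: one long vector versus 640 vectors of length 100.
one_long = vector_time_ns("add", 64_000)
many_short = 640 * vector_time_ns("add", 100)

# A*2.0 + B for 1,000 elements: one linked triad versus multiply then add.
triad = vector_time_ns("linked_triad", 1_000)
separate = vector_time_ns("multiply", 1_000) + vector_time_ns("add", 1_000)
```

Under this model the single long vector takes roughly a third of the time of the 640 short ones, and the linked triad takes a little over half the time of the separate multiply and add, which illustrates why long vectors and linked triads are preferred.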
3.3


The KBF migration technique is such that almost all of the
necessary operations can be vectorized. When working with a particular
w-slice, all of the operations, including the two-dimensional FFT's,
are vector operations. The computations performed at any given point
of the omega-slice must be performed at all of the points. If there is
a certain criterion that causes something different to occur at a given
omega-slice point, a control vector can be created, dynamically, and
the computation can still be performed in a vector manner. An example of
this may be found in the routine CUTOFF where the evanescent energy is
eliminated. In summary, there is no
particular operation in the KBF migration scheme that cannot be
treated as a vector operation. To emphasize this point, one should
examine the technique presented in Chapter 2 and notice that there are
no tricky operations that would prevent vectorization. In particular,
it is important to note that there are no operations that have the
following structure:

      DO 100 I = 1, N
         X(I) = F(Y(I))
         IF (X(I) .LT. VAL) GO TO 200
  100 CONTINUE
  200 CONTINUE

The above code cannot be efficiently vectorized because of the
inherently sequential nature of the computations: whether iteration
I+1 is executed at all depends on the result of iteration I.
3.4 DATA CONSIDERATIONS


As previously discussed, a program implementing the KBF migration
technique, extended into three dimensions, is easily expressed in terms
of vector operations. The program developed here contains very few
scalar operations, many of which are operations needed in order to
control various vector instructions or vector subroutine calls. Having
such a match of software to hardware, one might conclude that there are
no remaining barriers to running the program. There are, however, a
few major items that one tends to overlook, being overwhelmed by the
computational power of the CYBER 205. The greatest of these is the
data motion required to keep the CYBER 205 vector pipes busy.

One penalty for the use of vector operations is that the data must
be contiguous in memory for greatest efficiency (let alone for some
vector operations to run at all). Furthermore, the vectors must reside
in main memory as much as possible in order to prevent sure death from
thrashing. With this in mind, one must realize that the memory
requirement for the vectors that are necessary to perform a single step
of the integration of one omega slice is quite large. For example, a
(256 by 256) complex XY plane will require eleven vectors of length
131,072 half-words. These, along with various support vectors,
comprise 12 large pages (1 large page = 65,536 full-words). This is
slightly less than half of the memory available to a user on a
2-megaword 205; however, it is about all one can expect to get for any
reasonable period in a time-sharing environment. But this is really
just the tip of the iceberg: these are just the work arrays. The
total data set consists of the input data cube, the work arrays, and
the output data cube.

Continuing with the previous example, the input cube could very
well be of size 256*256*1024 half-words and the output cube could be as
much as twice the size of the input cube (the size of the output cube
depends upon the number of ZSTEPS in the migration). This would be a
total of 201,326,592 half-words, which is equivalent to 1536 large
pages. Obviously, this is much more data than any CYBER 205 can have
in memory at any given time. Consequently, the question of how to
handle the data-flow arises. A solution that one may consider is to
declare the data cubes to be huge arrays and to let the virtual memory
mechanism handle the data cubes.
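
The storage figures quoted in this section can be reproduced with a few lines of arithmetic. The Python sketch below only restates numbers given in the text (two half-words per full-word, 65,536 full-words per large page); the function name `large_pages` is hypothetical.

```python
HALF_WORDS_PER_FULL_WORD = 2
FULL_WORDS_PER_LARGE_PAGE = 65_536


def large_pages(half_words: int) -> float:
    """Convert a count of 32-bit half-words to large pages."""
    full_words = half_words / HALF_WORDS_PER_FULL_WORD
    return full_words / FULL_WORDS_PER_LARGE_PAGE


# One complex 256 x 256 plane: real and imaginary parts, one half-word each.
plane_vector = 256 * 256 * 2
work_pages = large_pages(11 * plane_vector)   # eleven work vectors

input_cube = 256 * 256 * 1024                 # half-words
output_cube = 2 * input_cube                  # upper bound quoted in the text
total_half_words = input_cube + output_cube
total_pages = large_pages(total_half_words)
```

The eleven work vectors alone account for 11 large pages, so the quoted total of 12 leaves one page for the support vectors; the two cubes together come to 1536 large pages, as stated.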

To consider declaring the two data cubes as arrays, one must
realize that access to these two arrays would have to be in a
contiguous manner. Otherwise, severe thrashing would result. In the
case of the KBF migration algorithm, access to the data cubes must be
done in several ways that would break the rule of contiguous access.
Thus, it would be wise to look into at least one alternative to
handling these data cubes as large arrays.
Before presenting the data motion method used in this study, the
need for efficiency must be established. Continuing with the previous
example and without discussing the code in detail, the subroutine RHS3
takes on the order of 100 milliseconds to run each time it is called.
In this example, RHS3 would be called on the order of 4*512*512
(1,048,576) times. The time needed for all of these calls is
approximately 29 hours. Thus, any time for performing the data-motion
is added onto the 29 hours. Therefore, one needs to find a mechanism
to perform the data-motion without making the program run for an
unacceptable amount of time.
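
The 29-hour estimate follows directly from the call count and the time per call. The short Python check below restates that arithmetic; the factor of 4 presumably corresponds to the four right-hand-side evaluations per Runge-Kutta step, while the meaning of the two factors of 512 is not stated in the text and is left uninterpreted here.

```python
SECONDS_PER_CALL = 0.100          # "on the order of 100 milliseconds"

calls = 4 * 512 * 512             # number of RHS3 calls quoted in the text
total_seconds = calls * SECONDS_PER_CALL
total_hours = total_seconds / 3600.0
```

The result is a little over 29 hours of computation before any I/O time is added.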
3.5 A FOUR-WAY PARALLEL DATA MOTION TECHNIQUE


CYBER 205 Fortran provides several routines that may be used to
implement I/O that runs concurrently with other instructions being
executed as well as with other I/O. These routines include Q7BUFIN,
Q7BUFOUT, and Q7WAIT. For detailed information on these routines, see
the CDC CYBER 200 FORTRAN VERSION 2 manual [7]. A typical use for
these routines would be as follows:


      CALL Q7BUFOUT( )

      CALL WORK( )


In this example, the programmer wishes to write information
out to a unit and have the routine WORK run concurrently with the I/O.
In general, as long as WORK does not use the I/O unit referred to in
the Q7BUFOUT call, it can do anything it wishes. Thus, there is CPU
activity concurrent with I/O activity.

Another example, where two I/O requests cause concurrent I/O, is as
follows:


      CALL Q7BUFIN( )

      CALL Q7BUFOUT( )
According to the CDC CYBER FORTRAN 2 manual [8], these calls are
legal, so long as they do not access the same data block on the same
disk. Also, two Q7BUFIN calls, two Q7BUFOUT calls, or a Q7BUFIN and a
Q7BUFOUT call can be active at one time for a given unit.

It should be obvious that these "Q7" calls are the basis of a
solution to the problem of data-flow that was presented in the previous
section. Indeed, they are; yet they are only the basis of the method
used in this study. Dr. Bjorn Mossberg [9], of Control Data
Corporation, wrote a utility known as SLICE4. Mossberg used the "Q7"
utilities; however, the scheme he developed is much more elaborate
than a series of Q7 calls to a particular I/O unit.

SLICE4

It is not within the scope of this paper to duplicate Mossberg's
documentation of SLICE4. However, the concept and the terminology of
SLICE4 will be presented as it applies to this study. For efficient
operation, SLICE4 must be tightly integrated into the master program.
Therefore, its terminology affects the view that one takes of the
master program.

In this study, two implementations of SLICE4 were needed and used:
one for the input data cube and one for the output data cube. To
explain the use of SLICE4, only the input data cube will be treated.
The output data cube is handled in a similar manner.
The first step in using SLICE4 is to impose a coordinate system
upon the data cube such that the cube is N1 by N2 by N3 elements in
size, where N1 is the number of elements in what one normally considers
the Z direction, N2 is the number of elements in the X direction, and
N3 is the number of elements in the Y direction. The next step is to
define a second coordinate system on the data cube. Instead of being
coordinates of individual data items, this second coordinate system
gives coordinates of "super-blocks." Super-blocks are small cubes of
the original data set. The super-block coordinate system has NS1
super-blocks in the 1-direction, NS2 in the 2-direction, and NS3 in the
3-direction, where NS1 and NS2 must be multiples of four. NS3 does not
have this restriction; however, for greatest efficiency, it should be
one or a multiple of four. The reason for the multiple of four rule is
that the super-blocks will reside on four different I/O units. No
matter which direction the cube is accessed, each I/O unit will have
one quarter of the super-blocks accessed. This is not the case when
only a partial row or column of super-blocks is accessed; thus, it is
most efficient to access a complete row or column. If it should happen
that more than one I/O unit is controlled by a given controller, then
SLICE4 will still execute, but in a less efficient manner (i.e., the
parallelism is partially inhibited). Thus, one may access any four
adjacent super-blocks at a cost which is one fourth the cost of
accessing the same data with conventional techniques.
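
The balancing property described above can be illustrated with a small Python model. The placement rule used here, unit = (i + j + k) mod 4, is purely hypothetical: the actual rule used by SLICE4 is not given in this paper. It is simply one rule under which a complete row of super-blocks in any direction is spread evenly over four I/O units when the row length is a multiple of four.

```python
from collections import Counter


def io_unit(i: int, j: int, k: int, units: int = 4) -> int:
    """Hypothetical placement: the unit holding super-block (i, j, k)."""
    return (i + j + k) % units


def units_hit(ns, direction, fixed, count=None):
    """Count super-blocks per I/O unit along a row in `direction` (0, 1, 2).

    ns = (NS1, NS2, NS3). `fixed` supplies the other two indices (its entry
    at `direction` is ignored). `count` restricts the access to a partial row.
    """
    length = ns[direction] if count is None else count
    hits = Counter()
    for s in range(length):
        idx = list(fixed)
        idx[direction] = s
        hits[io_unit(*idx)] += 1
    return hits


ns = (8, 4, 4)
full_row = units_hit(ns, direction=0, fixed=(0, 2, 3))
partial_row = units_hit(ns, direction=0, fixed=(0, 2, 3), count=2)
```

With this rule the full row of eight super-blocks places two on each of the four units, whereas the partial row of two touches only two units, which mirrors the remark that partial accesses lose part of the parallelism.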
The super-blocks themselves have a coordinate structure imposed
upon them. This coordinate structure is L1 by L2 by L3, where L1 is
the number of elements from the data cube in the 1-direction; L2 and
L3 are defined in the same manner for their individual directions.

Summarizing the terminology presented so far, the original data
cube is broken up into NS1 by NS2 by NS3 super-blocks. Each
super-block has L1 by L2 by L3 data elements. Thus the following rules
must apply:

    N1 = NS1 * L1   with NS1 = 4 * i, i >= 1
    N2 = NS2 * L2   with NS2 = 4 * j, j >= 1
    N3 = NS3 * L3

ACCESS

The rows and columns of super-blocks are referred to as slices. A 
1-slice is some column of super-blocks in the 1-direction, a 2-slice is
some row of super-blocks in the 2-direction, and a 3-slice is some row
of super-blocks in the 3-direction. One may access all, or just some,
of the super-blocks of a slice via SLICE4. However, in this study,
only the most efficient access is performed: accessing all
super-blocks of a given slice. As access can be by any given slice,
SLICE4 must have the super-blocks all formatted in the same manner.
Thus, when accessing a given slice, the slice is written into a buffer
by SLICE4 and the user must re-format the data from the buffer into a
work array in the format that corresponds to the direction of access.
One needs to be careful to have enough array and buffer space to
access the data cube in all the necessary directions. Thus, the size
of the super-block comes into question. The larger the super-block,
the fewer accesses to the data cube are needed, and vice versa. In this
study, the L1 dimension was set permanently to the value of 2. The
reason for this is that, as one recalls from the migration technique, a
complete XY plane is processed at any given time and there is only
enough memory space to have two input planes in memory at the same
time.
4.1 EXECUTION TESTS

As discussed in section 3.4, it would take over 29 hours of
execution time to migrate the maximum (assumed) data cube; thus for
testing purposes, an input cube of size (64x64x64) was used. For both
of the test runs discussed here, all of the traces consisted completely
of zeros, except the center trace, which had a single wavelet peaking at
sample 16 (in time). The correctly migrated result, in this case,
consists of a hemisphere. The first run (Figures 1 and 2) incorporated
a padding in the time direction to delay the wrap-around effect
inherent in Fourier algorithms. The second run (Figures 3 and 4) did
not incorporate a padding; thus, wrap-around effects appeared. The
first run took 240 CPU seconds and the second run took 115 CPU seconds.

Test Run 1: The migration of the input cube described above,
using a constant velocity of 3000 m/s, a Dz interval of 6.0 meters, a
Dx interval of 12.0 meters, a Dy interval of 12.0 meters, and a time
interval of 4.0 milliseconds, yields the results shown in Figures 1
and 2. Figures 1 and 2 are slices of the output cube in the XZ and in
the YZ directions, respectively, intersecting at the center of the
output cube. (Note the absence of the wrap-around effect.)





Test Run 2: The migration of the same input cube used in Test Run
1 using the same sampling rates in all dimensions, but with a velocity
interface (see Figure 3; V1 = 4000 m/s; V2 = 3000 m/s), yields the
results displayed in Figures 3 and 4. Note the wrap-around effect
present in these figures.

4.2 FACTORS AFFECTING SPEED OF COMPUTATION

Until a superior algorithm for performing the I/O required by the
KBF migration algorithm appears, SLICE4 will remain the most efficient
method available to perform the I/O task. However, should a CYBER 205
ever be equipped with 8, or even 16, I/O channels, SLICE4 should easily
be adapted to create SLICE8 and SLICE16 versions. Until then, there is
little chance of decreasing the time required to perform the I/O.

Other than I/O, the 4th-order Runge-Kutta algorithm employed in
the KBF migration technique is the most expensive feature.
Consequently, use of a less costly method for numerical integration
(e.g., a multi-point method, using the Runge-Kutta method to get
started) might result in increased computational efficiency.

4.3 CONCLUSIONS

The 3D KBF migration program presented in this thesis, implemented on
the CYBER 205 supercomputer, yields results that are
consistent with those of Kosloff and Baysal [10]. This was confirmed
by Kosloff [11]. Thus, a 3D migration program, using the KBF migration
technique (based on the full acoustic wave equation) and permitting lateral
velocity variations, is now available for use on the CYBER 205.